B. INTRODUCTION

In  clinical  follow-up  studies,  subjects  are  monitored  at  regular  time  intervals  for  a 
physical  condition.  It  is  often  the  case  that  an  event  under  observation  can  take  place  in 
between  two  successive  visits,  and  it  may  not  be  possible  for  the  subject  to  know  the  time 
to such an event exactly. For example, consider the situation in which a group of women
at  high  risk  for  breast  cancer  is  asked  to  take  a  chemopreventive  substance  for  a  fixed  time 
period.  At  the  end  of  the  period,  each  participating  woman  is  required  to  submit  a  blood 
or  urine  sample  at  regular  intervals  in  order  to  monitor  the  level  of  a  validated  intermediate 
biomarker.  Let  X  denote  the  time  from  cessation  of  use  of  the  agent  to  the  loss  of  its 
protective  effect,  quantified  as  a  return  to  baseline  value  of  the  biomarker.  If  a  woman 
submits  a  sample  for  assay  on  a  daily  basis,  the  value  of  X  can  be  observed  exactly,  unless 
the  protective  effect  is  still  present  by  the  time  the  study  is  terminated  so  that  X  is  right 
censored  in  the  usual  sense  of  survival  analysis.  In  practice,  however,  the  follow-up  interval 
can  be  a  week  or  longer;  therefore  the  exact  value  of  X  is  generally  unknown  but  is  known  to 
lie  between  the  time  points  L  and  R,  where  L  is  the  number  of  days  from  cessation  of  agent 
intake  to  the  last  time  the  sample  was  assayed  and  the  protective  effect  was  still  present,  and 
R  is  the  number  of  days  from  cessation  of  agent  intake  to  the  most  recent  time  the  sample 
was  assayed.  If  the  protective  effect  is  still  present,  then  R  takes  the  value  infinity.  In  any 
case, when the value of X is only known to lie between L and R, we say that X is censored in
the interval (L,R). Therefore the observed data consist of either censoring intervals (L,R)
or  exact  observations  X  =  L  =  R. 
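
As an illustration of how a visit record translates into a censoring interval, the following minimal Python sketch may help. It is not part of the study protocol. The function name `censoring_interval` and its arguments are hypothetical, and the sketch assumes that the effect is present at cessation (day 0), that R is the first assay at which the effect is found to be absent, and that the effect does not return once lost.

```python
import math

def censoring_interval(visit_days, effect_present):
    """Return (L, R) for one subject.

    visit_days: increasing days since cessation on which a sample was assayed.
    effect_present: parallel booleans, True while the protective effect persists.
    """
    L = 0.0  # assumption: the effect is present at cessation (day 0)
    for day, present in zip(visit_days, effect_present):
        if present:
            L = day            # latest assay with the effect still present
        else:
            return (L, day)    # first assay at which the effect is gone
    return (L, math.inf)       # effect never seen to vanish: R is infinity

# Weekly follow-up: the effect is lost somewhere between day 14 and day 21.
print(censoring_interval([7, 14, 21, 28], [True, True, False, False]))
# Effect still present at the last assay: right censored.
print(censoring_interval([7, 14, 21], [True, True, True]))
```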

We consider nonparametric estimation of the distribution function F(t) of a real-valued
random variable X (or its survival function S(t) = 1 - F(t), where F(t) = P{X < t}), when
the  sample  data  are  incomplete  due  to  restricted  observation  brought  about  by  interval 
censoring. 

At  present,  there  are  only  two  estimation  procedures  of  S  for  interval-censored  (IC) 
data  that  are  generalized  maximum  likelihood  estimates  (GMLE)  in  the  sense  of  Kiefer 
and Wolfowitz [1]. The first one is due to Peto [2] and makes use of the Newton-Raphson
algorithm.  The  second  is  due  to  Turnbull  [3]  and  makes  use  of  a  self-consistent  algorithm. 
A  solution  to  the  latter  algorithm  is  called  a  self-consistent  estimator  (SCE)  of  S.  In  each 
case,  there  is  no  closed-form  expression  for  the  estimator. 
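
To give a feel for what a self-consistent algorithm does, here is a minimal Python sketch of the self-consistency iteration on a fixed set of candidate mass points. It is a simplification and not Turnbull's full procedure: the candidate support is supplied by the caller rather than derived from the data, observations are read as closed sets [L, R], and the function name `self_consistent_masses` is hypothetical.

```python
def self_consistent_masses(intervals, support, tol=1e-10, max_iter=10_000):
    """Iterate p_j <- (1/n) * sum_i alpha_ij p_j / sum_k alpha_ik p_k.

    intervals: list of (L, R) with L <= R, read here as closed sets [L, R].
    support: candidate mass points (an assumption of this sketch).
    """
    n, m = len(intervals), len(support)
    # alpha[i][j] = 1 if support point j is compatible with observation i
    alpha = [[1.0 if L <= s <= R else 0.0 for s in support]
             for (L, R) in intervals]
    p = [1.0 / m] * m  # start from the uniform distribution on the support
    for _ in range(max_iter):
        new_p = [0.0] * m
        for row in alpha:
            denom = sum(a * pj for a, pj in zip(row, p))
            if denom == 0.0:
                raise ValueError("an observation has no compatible support point")
            for j in range(m):
                # share of this observation attributed to support point j
                new_p[j] += row[j] * p[j] / denom
        new_p = [v / n for v in new_p]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p

# Two subjects known to fail in [0, 1] and one in [2, 3].
print(self_consistent_masses([(0, 1), (0, 1), (2, 3)], [0.5, 2.5]))
```

A fixed point of this iteration is a self-consistent estimate on the chosen support; the number of iterations needed depends on the data, which is one practical consequence of having no closed-form expression.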

In the first year of our research, we focused our attention on IC data that satisfy a
condition which we called the DI condition: IC data (L_1,R_1), ..., (L_n,R_n) are said to satisfy
the DI condition if, given any two censoring intervals, either they are disjoint or one is a subset
of the other. In a clinical study in which every subject has the same follow-up schedule, say
at time points a_1, a_2, ..., a_k, we have (L,R) = (0,a_1), or (a_i,a_{i+1}), or (a_k,∞), and hence such
interval-censored data will satisfy the DI condition.
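
The DI condition is easy to check mechanically. The following Python sketch is one way to do so; the function name `satisfies_di` is hypothetical, and the sketch assumes that censoring intervals are open, so that two intervals sharing only an endpoint count as disjoint.

```python
import math
from itertools import combinations

def satisfies_di(intervals):
    """True if every pair of censoring intervals is disjoint or nested."""
    def disjoint(a, b):
        # open intervals (L, R): sharing only an endpoint counts as disjoint
        return a[1] <= b[0] or b[1] <= a[0]

    def nested(a, b):
        a_in_b = b[0] <= a[0] and a[1] <= b[1]
        b_in_a = a[0] <= b[0] and b[1] <= a[1]
        return a_in_b or b_in_a

    return all(disjoint(a, b) or nested(a, b)
               for a, b in combinations(intervals, 2))

# Common schedule at days 3, 6, 9: the DI condition holds.
print(satisfies_di([(0, 3), (3, 6), (6, 9), (9, math.inf)]))
# Two partially overlapping intervals: the DI condition fails.
print(satisfies_di([(0, 6), (3, 9)]))
```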

Under the DI interval-censorship model, we extended Efron's [4] redistribution-to-the-right
idea for right-censored data and proposed a redistribution-to-the-inside method to yield
a nonparametric estimator of S(t), which we called the redistribution-to-the-inside estimator
(RTIE).  Such  an  estimate  has  a  closed-form  expression  and  can  be  readily  calculated  for  IC 
data  of  any  size.  The  availability  of  an  explicit  expression  for  the  RTIE  has  enabled  us  to 
show  that  it  is  the  GMLE  under  DI  condition,  and  to  establish  asymptotic  properties  of  the 
RTIE. 
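
For readers unfamiliar with the idea that the RTIE extends, the sketch below illustrates Efron's redistribution-to-the-right construction for ordinary right-censored data. It is not the RTIE itself, whose closed form is not reproduced in this report. The function name `redistribute_to_the_right` is hypothetical, and at tied times the sketch places events before censored observations by assumption.

```python
def redistribute_to_the_right(times, is_event):
    """Return (time, mass) pairs for right-censored data.

    times: observed times min(X, T).
    is_event: True if the failure itself was observed, False if censored.
    """
    # sort by time; at tied times, events come before censored observations
    order = sorted(range(len(times)), key=lambda i: (times[i], not is_event[i]))
    n = len(order)
    mass = [1.0 / n] * n  # start with equal mass on every observation
    for pos, i in enumerate(order):
        if not is_event[i] and pos < n - 1:
            # a censored point passes its mass equally to all points to its right
            share = mass[pos] / (n - 1 - pos)
            for later in range(pos + 1, n):
                mass[later] += share
            mass[pos] = 0.0
        # if the largest observation is censored, its mass stays where it is
    return [(times[i], mass[pos]) for pos, i in enumerate(order)]

# The censored observation at time 2 hands its mass to times 3 and 4.
print(redistribute_to_the_right([1, 2, 3, 4], [True, False, True, True]))
```

The masses left on the observed failure times are the jumps of the product-limit estimate, which is why the construction yields an explicit expression in the right-censored case.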
More often than not, IC data do not satisfy the DI condition. In a clinical follow-up situation,
for example, a patient may miss a scheduled appointment. Therefore, it is necessary
to  consider  asymptotic  inference  under  more  general  conditions  of  interval  censorship. 
There  are  4  situations  in  which  IC  data  can  occur. 

Case 2 IC data (C2 data) consist of right-, left- and strictly interval-censored observations.
Here, an observation is called right censored if R = ∞, left censored if L = 0, exact
if L = R, and strictly interval censored if 0 < L < R < ∞. Examples of C2 data can be
found in [5]. Mixed IC data (MIC data) consist of both C2 data and exact observations;
they were referred to as partially IC data in our second-year report. Yu, Li and Wong [6] presented
a set of MIC data in breast cancer research. Doubly-censored data (DC data) consist of
right-censored, left-censored and exact observations. Examples of DC data can be found in [7].
Case 1 IC data (C1 data) consist of right-censored and left-censored observations. Examples of
C1 data can be found in [8] and [9].
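
The definitions above can be turned into a small classifier. The Python sketch below is illustrative only: the function names `classify` and `data_type` are hypothetical, and `data_type` is a coarse rule of thumb that looks only at which kinds of observation occur in a data set, not at how the data were generated.

```python
import math
from collections import Counter

def classify(L, R):
    """Label one observation (L, R) using the definitions in the text."""
    if L == R:
        return "exact"
    if R == math.inf:
        return "right censored"
    if L == 0:
        return "left censored"
    return "strictly interval censored"

def data_type(observations):
    """Coarse guess at the data type from the kinds of observation present."""
    kinds = Counter(classify(L, R) for L, R in observations)
    has_exact = kinds["exact"] > 0
    has_strict = kinds["strictly interval censored"] > 0
    if has_strict:
        return "MIC" if has_exact else "C2"
    return "DC" if has_exact else "C1"

sample = [(0, 5), (5, math.inf), (2, 4), (3, 3)]
print([classify(L, R) for L, R in sample])
print(data_type(sample))
```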

Four  different  interval  censorship  models  corresponding  to  the  four  different  types  of 
IC  data  have  been  proposed.  They  are  the  C2  model,  the  mixture  interval  censorship  model 
(MIC model), the DC model and the C1 model.

To study the asymptotic properties of the GMLE, we make use of the following assumptions:

(AS1) The censoring distribution is discrete but the survival distribution is arbitrary.

(AS2) The support set of the censoring vector is finite, but the survival distribution is arbitrary.

(AS3) A probability restriction. See Section C.

(AS4) A probability restriction. See Section C.

(AS5) The censoring distribution and the survival distribution are arbitrary, but have to satisfy
some regularity conditions.

In our second year of research, we established the following important asymptotic results
for the GMLE under both DI and non-DI conditions:

1. Under the C1 model or the C2 model, the GMLE is strongly consistent under Assumption
(AS1).

2. Under the C1 model, the GMLE is asymptotically normal and efficient under Assumption
(AS1).

3. Under the C2 model, the GMLE is asymptotically normal and efficient under Assumption
(AS2).

4. Under the MIC model, both the SCE and the GMLE are strongly consistent under
Assumption (AS2).

5. Under the MIC model, both the SCE and the GMLE are asymptotically normal and
efficient under Assumption (AS1).

In our third year, we established the following important asymptotic results for the
GMLE under both DI and non-DI conditions:

6. Under the MIC model, the SCE and the GMLE are strongly consistent under Assumption
(AS3).

7. Under the MIC model, the SCE and the GMLE are asymptotically normal and efficient
under Assumptions (AS3) and (AS4).

8. Under the DC model, we have proposed a modified GMLE and proved that the estimator
is strongly consistent, asymptotically normal and efficient under Assumption
(AS5).

9.  We  have  constructed  an  asymptotic  nonparametric  two-sample  test  procedure  for  all 
types  of  IC  data,  and  applied  it  to  a  breast  cancer  relapse  follow-up  study. 

10.  We  have  proved  the  consistency  of  the  GMLE  of  the  parameters  in  the  proportional 
hazards  model  for  C2  and  MIC  data. 

C.  BODY 

C.1. Case 1 and Case 2 Models.

We established consistency and asymptotic normality of the GMLE under Assumptions
(AS1) or (AS2) in our second year of research.

In the third year, we revised two manuscripts pertaining to these results, and they
have now been accepted by two peer-reviewed statistical journals ([10] and [11]).


C.2.  MIC  Model. 

In  the  second  year  of  our  research,  we  proposed  for  MIC  data  the  MIC  model,  which  is 
a  mixture  of  the  C2  interval  censorship  model  and  the  usual  right  censorship  (RC)  model 
(see  Yu,  Li  and  Wong  [6],  [12]  and  [13]).  The  C2  model  assumes  that  X  and  the  random 
censoring  vector  (Y,  Z)  are  independent  and  that  Y  <  Z  with  probability  one.  The  RC 
model  assumes  that  there  is  a  random  censoring  time  T,  which  is  independent  of  X,  and 
the observable information from the RC model is (min(X,T), I(X < T)). We introduce a
random  variable  D  to  distinguish  failure  times  coming  from  the  two  models: 


    D = 1  if the observation is from the RC model,
    D = 0  if the observation is from the C2 model.


Let P{D = 1} = π, where 0 < π < 1. Formally, a MIC data point is regarded as an
observation from the RC model with probability π and from the C2 model with probability
1 - π.

To express observed MIC data as intervals, we introduce a notation ⌊L,R⌋ defined as
follows:

    ⌊L,R⌋ = [0,Y)   if D = 0 and X < Y,
            [Y,Z)   if D = 0 and Y < X < Z,
            [Z,∞)   if D = 0 and X > Z,
            (T,∞)   if D = 1 and X > T,
            [X,X]   if D = 1 and X < T.

Let ⌊L_i,R_i⌋, i = 1, 2, ..., n, be i.i.d. copies of ⌊L,R⌋. We say that the observations
⌊L_i,R_i⌋ are from a mixture interval censorship model (MIC model).
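
The data-generating mechanism of the MIC model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the simulation code used in our work: the function name `draw_mic_observation` and its sampler arguments are hypothetical, the distributions of X, T and (Y, Z) are supplied by the caller, and ties between X and a censoring time are assigned to a case by assumption.

```python
import math
import random

def draw_mic_observation(rng, pi, draw_x, draw_t, draw_yz):
    """Draw one interval (L, R) from the mixture of the RC and C2 models.

    pi: probability that the observation comes from the RC model (D = 1).
    draw_x, draw_t, draw_yz: hypothetical samplers taking rng as argument;
    draw_yz must return a pair (y, z) with y < z.
    """
    x = draw_x(rng)
    if rng.random() < pi:          # D = 1: right censorship model
        t = draw_t(rng)
        # ties x == t are treated as exact here (an assumption)
        return (x, x) if x <= t else (t, math.inf)
    y, z = draw_yz(rng)            # D = 0: C2 model
    if x < y:
        return (0.0, y)            # left censored
    if x < z:
        return (y, z)              # strictly interval censored
    return (z, math.inf)           # right censored

def inspection_times(r):
    # two inspection times with y < z, purely for illustration
    y = r.uniform(0.1, 1.0)
    return y, y + r.uniform(0.1, 1.0)

rng = random.Random(0)
sample = [draw_mic_observation(rng, 0.3,
                               lambda r: r.expovariate(1.0),
                               lambda r: r.expovariate(0.5),
                               inspection_times)
          for _ in range(5)]
print(sample)
```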

Define τ = sup{t; P{min(X,T) < t} < 1}, τ_Y = sup{t; P{Y < t} = 0} and
τ_Z = sup{t; P{Z < t} < 1}. We assume that τ > τ_Z.

(AS3) P{L = τ} > 0 if P{X < τ} < 1, and P{R = τ_Y} > 0 if P{X < τ_Y} > 0.

Theorem 1. Under Assumptions (AS2) and (AS3), the SCE F̂(x) satisfies that


and 
lim_{n→∞} sup_{x<τ} |F̂(x) - F(x)| = 0 a.s.

To  establish  asymptotic  normality  for  the  SCE,  we  need  an  additional  assumption  on 
the  distribution  function,  namely, 

(AS4) P{X ∈ I_i ∩ I_j} > 0 for any two realizations, ⌊L_i,R_i⌋ = I_i and ⌊L_j,R_j⌋ = I_j, of
⌊L,R⌋, provided I_i ∩ I_j ≠ ∅.

Theorem 2. Under Assumptions (AS2), (AS3) and (AS4), the SCE F̂(x) satisfies that

for x < τ, √n (F̂(x) - F(x)) → a normal random variable as n → ∞.

The above two theorems are summarized in two separate papers. In the third year of
our research, we revised these two papers, and they have now been accepted by two
peer-reviewed journals ([6] and [13]). Moreover, in the third year of our research, we relaxed the
assumptions in these two theorems and proved the following results.

Theorem 3. Under Assumption (AS3), the SCE F̂(x) satisfies that
lim_{n→∞} sup_{x<τ} |F̂(x) - F(x)| = 0 a.s.

Theorem 4. Under Assumptions (AS3) and (AS4), the SCE F̂(x) satisfies that for x < τ,

√n (F̂(x) - F(x)) → a normal random variable as n → ∞.

These results extend the results we obtained in [6] and [13] by dropping
Assumption (AS2). A manuscript [12] pertaining to these results has been submitted for
publication.

C.3.  DC  Model. 

We consider efficient estimation of a survival function S of the random variable X
with doubly-censored data. The double censorship model assumes that X and the random
vector (Y, Z) are independent and Y < Z with probability one, and that X is uncensored
if Y < X < Z, right censored if Z < X and left censored if X < Y. Let S_Z and S_Y be the
survival functions of Z and Y, respectively, and let K = S_Z - S_Y. Under the assumption

(AS5) K(x-) > 0 for all x such that S(x) < 1 and S(x-) > 0,

we present an example in [14] to demonstrate that the GMLE of S is not asymptotically
normally distributed and is not asymptotically efficient, and we propose a modified GMLE
(for details, see [14]) and establish the following results for such a modified estimator.
Theorem 5. Under the DC model and Assumptions (AS3) and (AS5), the modified GMLE
F̂(x) converges to F(x) a.s. for all x = a_i, i ≥ 1.

Theorem 6. Under the DC model and Assumptions (AS3), (AS4) and (AS5), the modified
GMLE F̂ satisfies √n [F̂(x) - F(x)] → a normal random variable as n → ∞ for x = a_i.

These two theorems are summarized in a paper [14] submitted for publication.
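
The observation scheme of the double censorship model described at the start of this subsection can be sketched as follows. The function name `dc_observe` is hypothetical, the modified GMLE of [14] is not reproduced here, and ties between X and a censoring time are treated as uncensored by assumption.

```python
import math

def dc_observe(x, y, z):
    """Interval (L, R) recorded under double censoring, assuming y < z."""
    if x < y:
        return (0.0, y)          # left censored: only X < Y is known
    if x > z:
        return (z, math.inf)     # right censored: only X > Z is known
    return (x, x)                # uncensored (boundary ties land here by assumption)

print(dc_observe(0.5, 1, 2))
print(dc_observe(1.5, 1, 2))
print(dc_observe(3.0, 1, 2))
```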

C.4. Two-Sample Nonparametric Test.

Based on the asymptotic results that we have established so far, we propose a test
statistic that is represented as a cumulative weighted difference in the GMLEs of the distribution
functions to test for their equality. The test statistic is applicable to all four types of IC
data. We apply it to a breast cancer relapse follow-up study described as follows.
An Example. Three hundred and seventy-four women with stages I-III unilateral invasive
breast  cancer  surgically  treated  at  Memorial  Sloan-Kettering  Cancer  Center  between  1985 
and  1990  were  followed  for  relapse.  The  median  follow-up  duration  was  46  months.  Relapse 
time  was  given  by  the  time  interval  between  surgery  and  the  initial  relapse.  A  relapse  that 
took place between two successive follow-up visits was regarded as interval censored. If a
patient had not relapsed by the end of the study, then her relapse time was right censored.
Of  the  374  observations,  300  were  right  censored  (no  relapse),  21  were  left  censored  and  53 
were  strictly  interval  censored  (74  relapses). 

Bone  marrow  micrometastasis  (BMM)  was  determined  for  each  woman  at  the  time  of 
surgery. An important clinical question is whether remission duration is related to the number
of BMM cells detected. Figure 1 compares the relapse-free GMLE curves of patients with
number of BMM < 14 versus those with number of BMM > 14. Our two-sample asymptotic
nonparametric test yielded a P value close to 0.1. A manuscript [15] pertaining to the
proposed  asymptotic  two-sample  test  procedure  has  been  submitted  to  a  statistical  journal. 
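
The exact form of the test statistic and its standardization are given in [15] and are not reproduced in this report. Purely as an illustration of what a cumulative weighted difference between two estimated distribution functions looks like, the Python sketch below computes a left Riemann sum of the weighted difference over a grid. The function names `step_value` and `weighted_difference`, the grid, the weight function and the right-continuous convention for the step functions are all assumptions of this sketch.

```python
def step_value(jumps, t):
    """Value at t of a step function given as (time, mass) jumps.

    Uses the right-continuous convention (mass at times <= t), an assumption;
    replace <= with < for the left-continuous version.
    """
    return sum(mass for time, mass in jumps if time <= t)

def weighted_difference(jumps_1, jumps_2, grid, weight=lambda t: 1.0):
    """Left Riemann sum of weight(t) * (F1(t) - F2(t)) over the grid."""
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        diff = step_value(jumps_1, left) - step_value(jumps_2, left)
        total += weight(left) * diff * (right - left)
    return total

# Two toy estimates: the first group tends to fail earlier than the second.
group_1 = [(1, 0.5), (3, 0.5)]
group_2 = [(2, 0.5), (4, 0.5)]
print(weighted_difference(group_1, group_2, [0, 1, 2, 3, 4]))
```

A value near zero is what one would expect when the two distribution functions coincide; turning the raw difference into a test requires its asymptotic variance, which this sketch does not attempt.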

C.5.  Proportional  Hazards  Model. 

Under the restrictive assumption that both X and the censoring vector take on finitely
many values, we have proved that the GMLE of the parameters in Cox regression is
consistent. However, we have not yet established asymptotic normality for
these parameter estimators. We have completed a first draft on the consistency result in
[16].

D.  CONCLUSIONS 

In  the  third  year  of  our  DOD  grant,  we  have  essentially  completed  our  research  on  the 
asymptotic inference on the GMLE of the survival function for IC data, including consistency,
asymptotic normality and asymptotic efficiency. The results which we have established
provide a set of fundamentally important statistical tools for the analysis of most
types  of  IC  data  that  are  encountered  in  clinical  follow-up  studies.  In  the  fourth  and  final 
year  of  our  DOD  grant,  we  plan  to  pursue  the  following  three  concluding  issues  on  univariate 
IC  data: 

1.  establish  asymptotic  results  without  the  discrete  assumption  on  the  distribution  of  the 
random  censoring  vector, 

2. construct counterparts of log-rank tests for IC data and derive their asymptotic
properties,

3.  establish  asymptotic  results  for  Cox  regression  parameters  under  conditions  that  are 
more  relaxed  than  the  requirement  of  finiteness. 

We should point out that although we are administratively on the DOD grant in our
fourth and final year, we are not receiving any funding from the DOD in this year. We
will make every effort to complete our proposed work to bring our DOD funded research
on the asymptotic survival analysis of univariate interval-censored data to a definitively
satisfactory closure.